64 research outputs found

    ESTER: efficient search on text, entities, and relations

    We present ESTER, a modular and highly efficient system for combined full-text and ontology search. ESTER builds on a query engine that supports two basic operations: prefix search and join. Both of these can be implemented very efficiently with a compact index, yet in combination provide powerful querying capabilities. We show how ESTER can answer basic SPARQL graph-pattern queries on the ontology by reducing them to a small number of these two basic operations. ESTER further supports a natural blend of such semantic queries with ordinary full-text queries. Moreover, the prefix search operation allows for a fully interactive and proactive user interface, which after every keystroke suggests to the user possible semantic interpretations of his or her query, and speculatively executes the most likely of these interpretations. As a proof of concept, we applied ESTER to the English Wikipedia, which contains about 3 million documents, combined with the recent YAGO ontology, which contains about 2.5 million facts. For a variety of complex queries, ESTER achieves worst-case query processing times of a fraction of a second, on a single machine, with an index size of about 4 GB.
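The abstract names the two operations but does not define their signatures, so the following is only a toy, in-memory interpretation and not ESTER's actual implementation. It assumes an inverted index from words to document ids, a hypothetical `entity:` prefix for words that mark entity occurrences, and function names (`prefix_search`, `join`) chosen purely for illustration.

```python
from bisect import bisect_left

def prefix_search(index, vocab, doc_ids, prefix):
    """Return sorted (doc, word) pairs for all words starting with `prefix`.

    `vocab` is the sorted list of index keys; `doc_ids` restricts the hits
    to a set of documents (None means no restriction).
    """
    hits = []
    # Words sharing a prefix are contiguous in the sorted vocabulary.
    for word in vocab[bisect_left(vocab, prefix):]:
        if not word.startswith(prefix):
            break
        for doc in index[word]:
            if doc_ids is None or doc in doc_ids:
                hits.append((doc, word))
    return sorted(hits)

def join(hits_a, hits_b):
    """Return the sorted words that occur in both hit lists."""
    return sorted({w for _, w in hits_a} & {w for _, w in hits_b})

# Toy index (assumed data): word -> ids of the documents containing it.
index = {
    "entity:bohr": [2],
    "entity:curie": [3],
    "entity:einstein": [1, 3],
    "nobel": [1, 3],
    "physicist": [1, 2],
}
vocab = sorted(index)

# Entities mentioned near "phys*" that are also mentioned near "nobel".
phys_docs = {d for d, _ in prefix_search(index, vocab, None, "phys")}
nobel_docs = {d for d, _ in prefix_search(index, vocab, None, "nobel")}
ents_phys = prefix_search(index, vocab, phys_docs, "entity:")
ents_nobel = prefix_search(index, vocab, nobel_docs, "entity:")
result = join(ents_phys, ents_nobel)
```

In this sketch the semantic query is answered with four prefix searches and one join, which mirrors the abstract's claim that queries reduce to a small number of the two basic operations.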

    Permafrost biases climate signals in δ18O tree-ring series from a sub-alpine tree stand in Val Bever/Switzerland

    During recent decades, stable oxygen isotopes derived from tree-ring cellulose (δ18OTRC) have been frequently utilised as the baseline for palaeoclimatic reconstructions. In this context, numerous studies take advantage of the high sensitivity of trees close to their ecological distribution limit (high elevation or high latitudes). However, this increases the chance that indirect climatic forces such as cold ground induced by permafrost can distort the climate-proxy relationship. In this study, a tree stand of sub-alpine larch trees (Larix decidua Mill.) located in an inner alpine dry valley (Val Bever), Switzerland, was analysed for its δ18OTRC variations during the last 180 years. A total of eight L. decidua trees were analysed on an individual basis, half of which are located on verified sporadic permafrost lenses approximately 500 m below the expected lower limit of discontinuous permafrost. The derived isotope time series are strongly dependent on variations in summer temperature, precipitation and large-scale circulation patterns (geopotential height fields). The results demonstrate that trees growing outside of the permafrost distribution provide a significantly stronger and more consistent climate-proxy relationship over time than permafrost-affected tree stands. The climate sensitivity of permafrost-affected trees is analogous to that of the permafrost-free tree stands (positive and negative correlations with temperature and precipitation, respectively) but attenuated, in part leading to a complete loss of significance. In particular, decadal summer temperature variations are well reflected in δ18OTRC from permafrost-free sites (r = 0.62, p 0.05). Since both tree stands are located just a few meters away from one another and are subject to the same climatic influences, discrepancies in the isotope time series can only be attributed to variations in the trees’ source water that constrains the climatic fingerprints on δ18OTRC.
If the two individual time series are merged into one local mean chronology, the climatic sensitivity reflects an intermediate between the permafrost-free and -affected δ18OTRC time series. It can be deduced that a significant loss of information on past climate variations arises by simply averaging both tree stands without prior knowledge of differing subsurface conditions.

    Travelling on Graphs with Small Highway Dimension

    We study the Travelling Salesperson (TSP) and the Steiner Tree problem (STP) in graphs of low highway dimension. This graph parameter was introduced by Abraham et al. [SODA 2010] as a model for transportation networks, on which TSP and STP naturally occur for various applications in logistics. It was previously shown [Feldmann et al. ICALP 2015] that these problems admit a quasi-polynomial time approximation scheme (QPTAS) on graphs of constant highway dimension. We demonstrate that a significant improvement is possible in the special case when the highway dimension is 1, for which we present a fully-polynomial time approximation scheme (FPTAS). We also prove that STP is weakly NP-hard for these restricted graphs. For TSP we show NP-hardness for graphs of highway dimension 6, which answers an open problem posed in [Feldmann et al. ICALP 2015].

    Guidelines for the use and interpretation of assays for monitoring autophagy (3rd edition)

    In 2008 we published the first set of guidelines for standardizing research in autophagy. Since then, research on this topic has continued to accelerate, and many new scientists have entered the field. Our knowledge base and relevant new technologies have also been expanding. Accordingly, it is important to update these guidelines for monitoring autophagy in different organisms. Various reviews have described the range of assays that have been used for this purpose. Nevertheless, there continues to be confusion regarding acceptable methods to measure autophagy, especially in multicellular eukaryotes. For example, a key point that needs to be emphasized is that there is a difference between measurements that monitor the numbers or volume of autophagic elements (e.g., autophagosomes or autolysosomes) at any stage of the autophagic process versus those that measure flux through the autophagy pathway (i.e., the complete process including the amount and rate of cargo sequestered and degraded). In particular, a block in macroautophagy that results in autophagosome accumulation must be differentiated from stimuli that increase autophagic activity, defined as increased autophagy induction coupled with increased delivery to, and degradation within, lysosomes (in most higher eukaryotes and some protists such as Dictyostelium) or the vacuole (in plants and fungi). In other words, it is especially important that investigators new to the field understand that the appearance of more autophagosomes does not necessarily equate with more autophagy. In fact, in many cases, autophagosomes accumulate because of a block in trafficking to lysosomes without a concomitant change in autophagosome biogenesis, whereas an increase in autolysosomes may reflect a reduction in degradative activity. It is worth emphasizing here that lysosomal digestion is a stage of autophagy and evaluating its competence is a crucial part of the evaluation of autophagic flux, or complete autophagy.
Here, we present a set of guidelines for the selection and interpretation of methods for use by investigators who aim to examine macroautophagy and related processes, as well as for reviewers who need to provide realistic and reasonable critiques of papers that are focused on these processes. These guidelines are not meant to be a formulaic set of rules, because the appropriate assays depend in part on the question being asked and the system being used. In addition, we emphasize that no individual assay is guaranteed to be the most appropriate one in every situation, and we strongly recommend the use of multiple assays to monitor autophagy. Along these lines, because of the potential for pleiotropic effects due to blocking autophagy through genetic manipulation it is imperative to delete or knock down more than one autophagy-related gene. In addition, some individual Atg proteins, or groups of proteins, are involved in other cellular pathways so not all Atg proteins can be used as a specific marker for an autophagic process. In these guidelines, we consider these various methods of assessing autophagy and what information can, or cannot, be obtained from them. Finally, by discussing the merits and limits of particular autophagy assays, we hope to encourage technical innovation in the field.

    Genome-wide identification and phenotypic characterization of seizure-associated copy number variations in 741,075 individuals

    Copy number variants (CNV) are established risk factors for neurodevelopmental disorders with seizures or epilepsy. With the hypothesis that seizure disorders share genetic risk factors, we pooled CNV data from 10,590 individuals with seizure disorders, 16,109 individuals with clinically validated epilepsy, and 492,324 population controls and identified 25 genome-wide significant loci, 22 of which are novel for seizure disorders, such as deletions at 1p36.33, 1q44, 2p21-p16.3, 3q29, 8p23.3-p23.2, 9p24.3, 10q26.3, 15q11.2, 15q12-q13.1, 16p12.2, 17q21.31, duplications at 2q13, 9q34.3, 16p13.3, 17q12, 19p13.3, 20q13.33, and reciprocal CNVs at 16p11.2 and 22q11.21. Using genetic data from an additional 248,751 individuals with 23 neuropsychiatric phenotypes, we explored the pleiotropy of these 25 loci. Finally, in a subset of individuals with epilepsy and detailed clinical data available, we performed phenome-wide association analyses between individual CNVs and clinical annotations categorized through the Human Phenotype Ontology (HPO). For six CNVs, we identified 19 significant associations with specific HPO terms and generated, for all CNVs, phenotype signatures across 17 clinical categories relevant for epileptologists. This is the most comprehensive investigation of CNVs in epilepsy and related seizure disorders, with potential implications for clinical practice.

    GWAS meta-analysis of over 29,000 people with epilepsy identifies 26 risk loci and subtype-specific genetic architecture

    Epilepsy is a highly heritable disorder affecting over 50 million people worldwide, of whom about one-third are resistant to current treatments. Here we report a multi-ancestry genome-wide association study including 29,944 cases, stratified into three broad categories and seven subtypes of epilepsy, and 52,538 controls. We identify 26 genome-wide significant loci, 19 of which are specific to genetic generalized epilepsy (GGE). We implicate 29 likely causal genes underlying these 26 loci. SNP-based heritability analyses show that common variants explain between 39.6% and 90% of genetic risk for GGE and its subtypes. Subtype analysis revealed markedly different genetic architectures between focal and generalized epilepsies. Gene-set analyses of GGE signals implicate synaptic processes in both excitatory and inhibitory neurons in the brain. Prioritized candidate genes overlap with monogenic epilepsy genes and with targets of current antiseizure medications. Finally, we leverage our results to identify alternate drugs with predicted efficacy if repurposed for epilepsy treatment.

    Dimension reduction: A powerful principle for automatically finding concepts in unstructured data

    Dimension reduction techniques have been a successful avenue for automatically extracting the “concepts” underlying unstructured data, a task that naturally arises in fields as diverse as information retrieval, image processing, social science, etc. It is surprising how much can be achieved for this task using only the raw data itself, without resorting to any additional knowledge or intelligence. We will survey the most important schemes contributed from the various communities to date, by commenting on the following aspects: optimization techniques, the role of normalizations, setting the parameters, computing time, quality of results, and the integration of external knowledge.

    Why Spectral Retrieval Works

    We argue that the ability to identify pairs of related terms is at the heart of what makes spectral retrieval work in practice. Schemes such as latent semantic indexing (LSI) and its descendants have this ability in the sense that they can be viewed as computing a matrix of term-term relatedness scores which is then used to expand the given documents (not the queries). For almost all existing spectral retrieval schemes, this matrix of relatedness scores depends on a fixed low-dimensional subspace of the original term space. We instead vary the dimension and study for each term pair the resulting curve of relatedness scores. We find that it is actually the shape of this curve which is indicative of the term-pair relatedness, and not any of the individual relatedness scores on the curve. We derive two simple, parameterless algorithms that detect this shape and that consistently outperform previous methods on a number of test collections. Our curves also shed light on the effectiveness of three fundamental types of variations of the basic LSI scheme.

    Fast and reliable parallel hashing

    A perfect hash function for a (multi)set $X$ of $n$ integers is an injective function $h: X \to \{1,\ldots,s\}$, where $s = O(n)$, that can be stored in $O(n)$ space and evaluated in constant time by a single processor. We show that a perfect hash function for a given multiset of $n$ integers can be constructed optimally in $O(\log^* n)$ time using $O(n/\log^* n)$ processors. Our algorithm is faster than all previously published methods. More significantly, it is highly reliable: Whereas analyses of previous fast parallel hashing schemes provided bounds on the expected resource requirements only, our algorithm is guaranteed to stay within the bounds given with overwhelming probability.
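To make the definition concrete, here is a minimal sequential sketch of a perfect hash function with linear range. It is not the parallel algorithm of the abstract: it swaps in the classic two-level (FKS-style) randomized construction, and the names `build_perfect_hash` and `_make_hash`, the prime `P`, and the 0-based range are assumptions made for illustration. Keys are assumed to be non-negative integers smaller than `P`.

```python
import random

P = (1 << 61) - 1  # a Mersenne prime; keys are assumed to lie in [0, P)

def _make_hash(m, rng):
    """Draw a random function x -> ((a*x + b) mod P) mod m."""
    a = rng.randrange(1, P)
    b = rng.randrange(0, P)
    return lambda x: ((a * x + b) % P) % m

def build_perfect_hash(keys, seed=0):
    """Return (h, s) with h injective on the distinct keys and 0 <= h(x) < s."""
    rng = random.Random(seed)
    keys = list(set(keys))  # duplicates in a multiset share one slot
    n = max(1, len(keys))
    # Level 1: retry until the squared bucket sizes sum to at most 4n.
    while True:
        h1 = _make_hash(n, rng)
        buckets = [[] for _ in range(n)]
        for x in keys:
            buckets[h1(x)].append(x)
        if sum(len(b) ** 2 for b in buckets) <= 4 * n:
            break
    # Level 2: a bucket of size b gets its own collision-free table of size b*b.
    offsets, inner, total = [], [], 0
    for bucket in buckets:
        m = len(bucket) ** 2
        offsets.append(total)
        if m == 0:
            inner.append(None)  # empty bucket: no key of the set maps here
            continue
        while True:
            h2 = _make_hash(m, rng)
            if len({h2(x) for x in bucket}) == len(bucket):
                break
        inner.append(h2)
        total += m

    def h(x):
        # Two arithmetic hash evaluations and two array lookups.
        i = h1(x)
        return offsets[i] + inner[i](x)

    return h, total

keys = [3, 17, 17, 42, 1000003, 2 ** 40, 99, 0]
h, s = build_perfect_hash(keys)
slots = {h(x) for x in set(keys)}
```

The returned `h` is only defined on the keys it was built for, and the range size `s` is at most four times the number of distinct keys, so both the range and the stored tables are linear in $n$.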

    Insights from Viewing Ranked Retrieval as Rank Aggregation

    We view a variety of established methods for ranked retrieval from a common angle, namely as a process of combining query-independent rankings that were precomputed for certain attributes. Apart from a general insight into what effectively distinguishes various schemes from each other, we obtain three specific results concerned with concept-based retrieval. First, we prove that latent semantic indexing (LSI) can be implemented to answer queries in time proportional to the number of words in the query, which improves over the standard implementation by an order of magnitude; a similar result is established for LSI’s probabilistic sibling PLSI. Second, we give a simple and precise characterization of the extent to which LSI can deal with polysems, and when it fails to do so. Third, we demonstrate that the recombination of the intricate, yet relatively cheap mechanism of PLSI for mapping queries to attributes, with a simplistic, easy-to-compute set of document rankings gives a retrieval performance which is at least as good as that of the most sophisticated concept-based retrieval schemes.